Data Visualization with ggplot2
We start by loading the package ggplot2.

    library(ggplot2)

## Plotting with ggplot2
ggplot2 is a plotting package that makes it simple to create complex plots from data in a data frame.
It provides a more programmatic interface for specifying what variables to plot, how they are displayed, and general visual properties.
Therefore, we only need minimal changes if the underlying data change or if we decide to change from a bar plot to a scatter plot.
This helps in creating publication-quality plots with minimal adjustment and tweaking.
ggplot2 works best with data in the ‘long’ format, i.e., a column for every variable and a row for every observation.
Well-structured data will save you lots of time when making figures with ggplot2.
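To make the long format concrete, here is a minimal sketch in base R. The table and its columns (`POS`, `sample_id`, `DP`) are invented for illustration; they are an assumption, not data from this lesson. The same measurements are stored first in a wide layout, with one column per sample, and then in a long layout, with one row per observation.

```r
# Made-up read depths for two samples at two positions (illustrative only)
wide <- data.frame(
  POS      = c(100, 200),
  sample_A = c(4, 7),
  sample_B = c(9, 5)
)

# Long format: one row per observation, one column per variable
long <- data.frame(
  POS       = rep(wide$POS, times = 2),
  sample_id = rep(c("sample_A", "sample_B"), each = nrow(wide)),
  DP        = c(wide$sample_A, wide$sample_B)
)

print(long)
```

In the long layout, `sample_id` is an ordinary column, so it can be mapped to an aesthetic such as color.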
ggplot graphics are built step by step by adding new elements.
Adding layers in this fashion allows for extensive flexibility and customization of plots.
To build a ggplot, we will use the following basic template that can be used for different types of plots:
```
ggplot(data = <DATA>, mapping = aes(<MAPPINGS>)) + <GEOM_FUNCTION>()
```
Use the ggplot() function and bind the plot to a specific data frame using the data argument:
```
ggplot(data = variants, aes(x = POS, y = DP))
```
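A minimal, self-contained sketch of this step follows. The `variants` data frame built here is a made-up stand-in that only borrows the column names used in this lesson (`POS`, `DP`); the values are invented. Calling `ggplot()` alone sets up the axes from the mapping but draws no data, because no geom has been added yet.

```r
library(ggplot2)

# Hypothetical stand-in for `variants`; values are invented for illustration
variants <- data.frame(
  POS = c(1200, 48000, 250000, 910000, 1500000, 2300000),
  DP  = c(4, 11, 7, 10, 9, 12)
)

# Bind the plot to the data frame and map POS to x and DP to y
p <- ggplot(data = variants, aes(x = POS, y = DP))

# Printing draws an empty panel with axes: there are no layers yet
print(p)
```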

Add ‘geoms’ – graphical representations of the data in the plot (points, lines, bars).
ggplot2 offers many different geoms; we will use some common ones today, including:

* `geom_point()` for scatter plots, dot plots, etc.
* `geom_boxplot()` for, well, boxplots!
* `geom_line()` for trend lines, time series, etc.

To add a geom to the plot, use the `+` operator. Because we have two continuous variables, let’s use `geom_point()` first:
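The sketch below adds a point layer with `+`. It assumes a small, invented `variants` data frame so that it can run on its own; only the column names `POS` and `DP` come from this lesson.

```r
library(ggplot2)

# Hypothetical stand-in for `variants`; values are invented for illustration
variants <- data.frame(
  POS = c(1200, 48000, 250000, 910000, 1500000, 2300000),
  DP  = c(4, 11, 7, 10, 9, 12)
)

# Add a point layer to the base plot with the + operator
p <- ggplot(data = variants, aes(x = POS, y = DP)) +
  geom_point()

print(p)
```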

The + in the ggplot2 package is particularly useful because it allows you to modify existing ggplot objects.
This means you can easily set up plot templates and conveniently explore different types of plots,
so the above plot can also be generated with code like this:
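One way to do this, sketched with an invented `variants` data frame (the values are placeholders, and the names `scatter` and `line_plot` are chosen here for illustration), is to store the base plot in a variable and add geoms to it afterwards:

```r
library(ggplot2)

# Hypothetical stand-in for `variants`; values are invented for illustration
variants <- data.frame(
  POS = c(1200, 48000, 250000, 910000, 1500000, 2300000),
  DP  = c(4, 11, 7, 10, 9, 12)
)

# Assign the base plot (data and mapping, no geom) to a variable
coverage_plot <- ggplot(data = variants, aes(x = POS, y = DP))

# Add a geom to the stored plot and draw it
scatter <- coverage_plot + geom_point()
print(scatter)

# The same template can be reused with a different geom
line_plot <- coverage_plot + geom_line()
print(line_plot)
```

Because `+` returns a new plot object, `coverage_plot` itself stays unchanged and can serve as a template for further variations.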

## Building your plots iteratively
Building plots with ggplot2 is typically an iterative process.
We start by defining the dataset we’ll use, laying out the axes, and choosing a geom.

Then, we start modifying this plot to extract more information from it.
For instance, we can add transparency (alpha) to avoid overplotting:

    ggplot(data = variants, aes(x = POS, y = DP)) +
      geom_point(alpha = 0.5)

We can also add colors for all the points:
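As a sketch, a fixed color is set as an argument of the geom, outside `aes()`, so it applies to every point. The choice of `"blue"` and the invented `variants` data frame are assumptions made for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for `variants`; values are invented for illustration
variants <- data.frame(
  POS = c(1200, 48000, 250000, 910000, 1500000, 2300000),
  DP  = c(4, 11, 7, 10, 9, 12)
)

# alpha and color are constants here, so they go outside aes()
p <- ggplot(data = variants, aes(x = POS, y = DP)) +
  geom_point(alpha = 0.5, color = "blue")

print(p)
```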

Or, to color each sample in the plot differently,
you can use a vector as an input to the `color` argument inside `aes()`. ggplot2 will assign a different color to each distinct value in the vector.
Here is an example where we color with sample_id:
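A minimal sketch, assuming an invented `variants` data frame with three made-up sample names: mapping `sample_id` to `color` inside `aes()` gives each sample its own color and adds a legend.

```r
library(ggplot2)

# Hypothetical stand-in for `variants`; sample names and values are invented
variants <- data.frame(
  sample_id = rep(c("sample_A", "sample_B", "sample_C"), each = 3),
  POS = c(1200, 250000, 910000, 3500, 460000, 2300000, 9000, 640000, 1800000),
  DP  = c(4, 7, 10, 9, 5, 8, 6, 13, 7)
)

# color is mapped to a column, so it goes inside aes()
p <- ggplot(data = variants, aes(x = POS, y = DP, color = sample_id)) +
  geom_point(alpha = 0.5)

print(p)
```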

To make our plot more readable, we can add axis labels with `labs()`: “Base Pair Position” for x and “Read Depth (DP)” for y.
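The labels can be attached as one more layer. This sketch again relies on an invented `variants` data frame; the label strings are the ones given in the text.

```r
library(ggplot2)

# Hypothetical stand-in for `variants`; sample names and values are invented
variants <- data.frame(
  sample_id = rep(c("sample_A", "sample_B", "sample_C"), each = 3),
  POS = c(1200, 250000, 910000, 3500, 460000, 2300000, 9000, 640000, 1800000),
  DP  = c(4, 7, 10, 9, 5, 8, 6, 13, 7)
)

# labs() replaces the default axis titles (the column names)
p <- ggplot(data = variants, aes(x = POS, y = DP, color = sample_id)) +
  geom_point(alpha = 0.5) +
  labs(x = "Base Pair Position",
       y = "Read Depth (DP)")

print(p)
```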

## Faceting
ggplot2 has a special technique called faceting that allows the user to split one plot into multiple plots based on a factor included in the dataset. We will use it to split our mapping quality plot into three panels, one for each sample.
We do this with `facet_grid()`.
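A sketch of the faceted mapping quality plot, assuming an invented `variants` data frame with three made-up samples and an `MQ` column. The formula `. ~ sample_id` puts each sample in its own column, so the panels sit side by side.

```r
library(ggplot2)

# Hypothetical stand-in for `variants`; sample names and values are invented
variants <- data.frame(
  sample_id = rep(c("sample_A", "sample_B", "sample_C"), each = 3),
  POS = c(1200, 250000, 910000, 3500, 460000, 2300000, 9000, 640000, 1800000),
  MQ  = c(60, 37, 60, 60, 25, 60, 60, 42, 60)
)

# One panel per sample, arranged in columns (rows ~ columns)
p <- ggplot(data = variants, aes(x = POS, y = MQ, color = sample_id)) +
  geom_point() +
  labs(x = "Base Pair Position",
       y = "Mapping Quality (MQ)") +
  facet_grid(. ~ sample_id)

print(p)
```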

This looks OK, but it would be easier to read if the plot facets were stacked vertically rather than horizontally.
`facet_grid()` allows you to explicitly specify how you want your plots to be arranged via formula notation (`rows ~ columns`; a `.` can be used as a placeholder that indicates only one row or column).

    ggplot(data = variants, aes(x = POS, y = MQ, color = sample_id)) +
      geom_point() +
      labs(x = "Base Pair Position",
           y = "Mapping Quality (MQ)") +
      facet_grid(sample_id ~ .)

## Themes
Plots with a white background usually look more readable when printed.
We can set the background to white using the function theme_bw().
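As a sketch, `theme_bw()` is added like any other component. The `variants` data frame below is an invented stand-in, and the vertically stacked facets are kept from the previous step.

```r
library(ggplot2)

# Hypothetical stand-in for `variants`; sample names and values are invented
variants <- data.frame(
  sample_id = rep(c("sample_A", "sample_B", "sample_C"), each = 3),
  POS = c(1200, 250000, 910000, 3500, 460000, 2300000, 9000, 640000, 1800000),
  MQ  = c(60, 37, 60, 60, 25, 60, 60, 42, 60)
)

# theme_bw() swaps the default grey panel background for a white one
p <- ggplot(data = variants, aes(x = POS, y = MQ, color = sample_id)) +
  geom_point() +
  labs(x = "Base Pair Position",
       y = "Mapping Quality (MQ)") +
  facet_grid(sample_id ~ .) +
  theme_bw()

print(p)
```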

## Barplots
We can create barplots using the `geom_bar()` geom.
Let’s make a barplot showing the number of variants for each sample by type.
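A minimal sketch of such a barplot follows. The data frame name `variants_indel`, its `mutation_type` column, and all values are assumptions made for illustration; in practice the mutation type would have to be derived from the variant calls first. `geom_bar()` counts the rows in each category, so only an x mapping is needed.

```r
library(ggplot2)

# Hypothetical data: one row per variant, with an invented type label
variants_indel <- data.frame(
  sample_id     = rep(c("sample_A", "sample_B", "sample_C"), each = 4),
  mutation_type = c("SNP", "SNP", "INDEL", "SNP",
                    "SNP", "INDEL", "INDEL", "SNP",
                    "SNP", "SNP", "SNP", "INDEL")
)

# Bar height is the number of rows per mutation type;
# fill splits each bar by sample
p <- ggplot(variants_indel, aes(x = mutation_type, fill = sample_id)) +
  geom_bar()

print(p)
```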
